Abstract

Background: The classical MIKC-type MADS-box transcription factors comprise one gene family that plays 
diverse roles in the flowering process ranging from floral initiation to the development of floral organs. Despite 
their importance in regulating developmental processes that impact crop yield, they remain largely unexplored in 
the major legume oilseed crop, soybean. 

Results: We identified 57 MIKC^c-type transcription factors from soybean and determined the in silico gene
expression profiles of the soybean MIKC^c-type genes across different tissues. Our study implicates three MIKC^c-type
transcription factors as novel members of the AGAMOUS-LIKE 6 (AGL6) subfamily of the MIKC^c-type MADS-box
genes, and we named this sister clade PsMADS3. While similar genes were identified in other legume species,
poplar and grape, no such gene is represented in Arabidopsis thaliana or rice. RT-PCR analysis of these three
soybean PsMADS3 genes during early floral initiation revealed temporal expression similar to that of
APETALA1, a gene known to function as a floral meristem identity gene. However, RNA in situ hybridisation showed
that their spatial expression patterns are markedly different from those of APETALA1.

Conclusion: The legume flower development system differs from that of the model plant, Arabidopsis. There is an
overlap in the initiation of different floral whorls in soybean, and inflorescence meristems can revert to leaf
production depending on the environmental conditions. MIKC^c-type MADS-box genes have been shown to play
key regulatory roles in different stages of flower development. We identified members of the PsMADS3 sub-clade in
legumes that show differential spatial expression during floral initiation, indicating their potential novel roles in the
floral initiation process. The results from this study will contribute to a better understanding of legume-specific
floral developmental processes.
Background

Flower development in plants involves tightly regulated processes starting from floral
initiation to flower formation. The underlying processes have been extensively investigated,
as flower development is an important agronomic trait that determines crop yield. Various
transcription factors are essential in regulating these developmental processes, including
the family of MADS-box transcription factors.

The MADS-box transcription factors, especially the plant-specific classical MIKC-type
(MIKC^c-type) MADS-box genes, are known to play key regulatory roles in different stages
of flower development. Their roles in coordinating floral developmental processes have been
revealed by functional studies largely carried out in the model plant, Arabidopsis thaliana.
The MIKC^c-type genes are characterised by a conserved structural organisation of the
MADS-box, Intervening-, Keratin-like- and C-domains. The highly conserved MADS-domain and
the weakly conserved I-domain are required for DNA binding, while the strongly conserved
K-domain and the variable C-domain regulate protein interactions [1].

Genome-wide analyses of the MIKC^c-type genes have been carried out in Arabidopsis [2],
rice [3] and poplar [4]. While the Arabidopsis and rice genomes have similar numbers of
MIKC^c-type genes (39 vs. 38), poplar has 55 of these genes, suggesting a higher gene birth
rate compared to Arabidopsis or rice. These MIKC^c-type genes can be divided into 15 distinct
gene clades, with each clade named after the first member identified [3,5]. All but two
(TM8 and OsMADS32) are found in Arabidopsis [3,5], and the FLC clade may be absent from the
rice genome [3]. It remains unclear whether all clades are present in the poplar genome, as
no TM8 genes were used in the phylogenetic analysis [4].

The SQUA subfamily clade includes four Arabidopsis members, APETALA1 (AP1), CAULIFLOWER
(CAL), FRUITFULL (FUL) and AGAMOUS-LIKE 79 (AGL79) [2]. The functions of AP1, CAL and FUL
have been characterised, indicating that they play partially redundant roles in determining
floral meristem identity [6]. The SEPALLATA (SEP) family belongs to the AGL2 clade, and
there are four members documented in Arabidopsis [7]. All four members (SEP1, SEP2, SEP3
and SEP4) play redundant functions in determining floral organ identity and floral meristem
determinacy. AP1 has been shown to bind directly to the SEP3 promoter, thereby rapidly
increasing the expression of SEP3 [8]. The AGL6 subfamily has a relatively small
representation (only two genes, AGL6 and its paralog AGL13) in Arabidopsis. While no
knockout phenotype has been described for either of these genes in Arabidopsis, studies in
rice, maize and Petunia hybrida have largely demonstrated the roles of the AGL6 subfamily
in regulating floral organ identity and floral meristem determinacy, indicating redundant
roles with closely related genes including SEP [9-11]. A phylogenetic study showed that the
SQUA, SEP and AGL6 subfamilies are always rooted together in one superclade, which may be
correlated with their overlapping roles in regulating flower development.

A total of 212 MADS-box genes were predicted in the recent genome sequence of soybean [12].
We previously reported the diversification of gene expression and microRNAs in the legume
shoot apical meristem (SAM) [13-16]. However, much remains to be learned about these genes,
especially given their potential impact on crop production. Soybean is the largest legume
crop in the world and accounts for more than 50% of global oilseed production. In this
study, we identified all the soybean MIKC^c-type MADS-box genes using the current Glyma1.0
gene set and examined their potential phylogenetic relationships to their Arabidopsis, rice
and poplar counterparts. We examined the expression patterns across different soybean
tissues for the entire family. Intriguingly, the results revealed a novel AGL6 sister clade
of MIKC^c-type genes in soybean, and we focused our subsequent analysis on members of this
novel sub-clade.

Results and discussion 

Molecular evolutionary analysis of soybean MIKC^c-type
MADS-box transcription factors

When we searched the soybean predicted gene set available at Phytozome [12] for sequences
containing both a MADS-box and a K-domain, we identified a total of 57 sequences.
Subsequent inspection revealed that three of the sequences were incomplete. We attempted to
obtain full-length sequences for these genes using gene prediction software on the genome
sequence surrounding the partial sequences, but this did not yield any results. Therefore,
we omitted these sequences from further analysis. To investigate their phylogenetic
relationships with MIKC^c-type genes from Arabidopsis, rice and poplar, reported MIKC^c
group protein sequences [3-5] were retrieved from their respective databases. A total of
159 conceptually translated protein sequences were used in the phylogenetic analysis.

Fifteen existing clades were identified from the generated phylogenetic tree: AGL2, AGL6,
SQUA, AGL12, FLC, TM3, AGL17, AG, OsMADS32, TM8, STMADS11, AGL15, GGM13, DEF and GLO
(Figure 1). The relationships among the different clades are similar to other reports
[2-5]. With the exception of OsMADS32 and TM8, soybean genes are found in all clades, and
the number of soybean sequences within each clade varies from one (AGL15 and AGL17) to nine
(SQUA), with most genes occurring in duplicate. Consistent with a previous report [5], the
genes in AGL2, SQUA and AGL6 form a superclade together (Figure 1). Within the AGL6 clade,
a strongly supported internal branch seems to be separate from the existing AGL6 members,
suggesting it is a novel sister clade not represented in Arabidopsis or rice. This novel
sub-clade consists of three soybean genes, Glyma16g32540 and a homeolog pair, Glyma10g38580
(GmMADS3a) and Glyma20g29250 (GmMADS3b). The top BLASTX match of these genes is a MADS-box
transcription factor from garden pea, which is annotated as PsMADS3 [17] (data not shown).
Therefore, we named it the PsMADS3 sub-clade.

Figure 1 Relationships among representative members of MIKC^c-type genes. The phylogenetic
tree was based on MUSCLE alignments of conceptual protein sequences spanning the MADS-, I-
and K-domains of sequences from Arabidopsis, rice, poplar and soybean. The unrooted
bootstrap consensus tree was constructed using the Maximum Likelihood method implemented in
MEGA 5. The number at each node is the bootstrap percentage (200 replications), and nodes
with less than 50% bootstrap values were collapsed. Fifteen clades of MIKC^c-type genes are
indicated, with PsMADS3 potentially representing a novel sister clade of AGL6.

We next examined whether similar orthologous members of PsMADS3 exist in species other than
soybean, garden pea and poplar. A BLASTP search at the NCBI public database identified
potential PsMADS3 sequences from other species (data not shown). Using the top 40 matching
orthologs, including sequences from the sister clade, AGL6, for phylogenetic analysis,
three distinct groups were identified in the tree rooted with gymnosperm AGL6 as the
outgroup (Figure 2). One clade groups all of the monocot sequences, whereas the other two
clades contain AGL6 and PsMADS3 sequences (Figure 2). As we were only using sequences
available in the public database, it is likely that similar PsMADS3 sequences exist in
species other than those examined. Future studies with increased taxon sampling will help
to clarify the node. Members of the PsMADS3 sub-clade are represented not only in legume
species but also in poplar and grape (Figure 2). As the genome sequences for Arabidopsis
and rice are available and well annotated, we are confident that members of this PsMADS3
sub-clade are absent from these two species. Furthermore, the observation that no
orthologous PsMADS3 gene is found among the top matching sequences from other monocots,
including wheat and maize (Figure 2), implies that such genes may be absent from monocots.
The PsMADS3 clade may have evolved after the monocot-dicot divergence and following the
emergence of the Arabidopsis species. These genes may have been overlooked previously due
to their absence in Arabidopsis and rice. In fact, none of these legume genes were analysed
in a recent study on eudicot AGL6 [18].
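
The hit selection described for Figure 2 (the top 40 BLASTP matches with an E value below
1e-50, followed by removal of duplicated or short sequences) can be sketched roughly as
follows. This is only an illustration of the filtering logic, not the procedure actually
used: the Hit record, the function name select_hits and in particular the min_length
threshold are assumptions, since the text does not state how "short" was defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str  # database identifier of the matched protein
    evalue: float   # BLASTP E value reported for the match
    sequence: str   # protein sequence of the match


def select_hits(hits, max_evalue=1e-50, top_n=40, min_length=150):
    """Keep the best hits below the E-value cutoff, then drop short or
    duplicated sequences.

    min_length is a hypothetical threshold; the source gives no value.
    """
    # Apply the E-value cutoff and rank the survivors (smaller is better).
    passing = [h for h in hits if h.evalue < max_evalue]
    ranked = sorted(passing, key=lambda h: h.evalue)[:top_n]

    kept, seen = [], set()
    for hit in ranked:
        if len(hit.sequence) < min_length:
            continue  # treated as a partial sequence
        if hit.sequence in seen:
            continue  # identical sequence already retained
        seen.add(hit.sequence)
        kept.append(hit)
    return kept
```

Because the ranking is applied before the duplicate and length filters, fewer than 40
sequences may remain, which matches the description of filtering after download.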

In silico analysis of soybean MIKC^c gene expression
patterns

We previously performed high-throughput RNA-sequencing on micro-dissected shoot apical
meristems (SAMs) undergoing the early floral initiation process [29]. The samples were
derived from soybean plants (10-day-old) subjected to different lengths of short-day (SD)
treatments (0SD, 1SD, 2SD, 3SD and 4SD), and the induction of the floral meristem identity
genes including GmAP1 occurred on 4SD. Because genes that are involved in later processes
of flowering, such as floral organ development, or that are only expressed in other tissues
such as roots may not be captured by our dataset, we also used the transcriptome data
reported by Libault and co-workers [19] to include a diverse range of tissues including
flower, seed pod, root, nodule and root tip in our in silico analysis (Figure 3).

All identified soybean MIKC^c-genes are expressed in at least one of the three reproductive
tissues represented (reproductive SAM, flower and pod), except for members of the AGL12
clade, which are only expressed in the root (Figure 3). Arabidopsis AGL12 is preferentially
expressed in the root, and recent loss-of-function analyses have revealed its roles in not
only regulating root meristem cell proliferation but also flowering transition [20,21].
Based on the soybean AGL12-LIKE expression profile, it is tempting to speculate that their
functions in floral regulation may have been lost. A similar expression pattern was
observed for most MIKC^c-genes clustered within a clade. All duplicated genes are
transcribed and have comparable expression profiles, especially in the reproductive SAM
(Figure 3), suggesting their functional significance.

Figure 2 Phylogeny of AGL6 genes. The tree was produced as in Figure 1 but from complete
protein sequences. The predicted peptide sequence of GmMADS3a (Glyma10g38560) was used for
a BLASTP search at NCBI. The top 40 matches (E value < 1e-50) were downloaded from NCBI and
filtered for duplicated or short sequences prior to phylogenetic analysis. The tree was
rooted by using the gymnosperm AGL6 as the outgroup. Plants used for analysis: Agapanthus
praecox, Arabidopsis thaliana, Asparagus officinalis, Bambusa oldhamii, Chrysanthemum
morifolium, Crocus sativus, Dendrocalamus latiflorus, Elaeis guineensis, Epimedium
sagittatum, Gerbera jamesonii, Glycine max, Gnetum parvifolium, Gossypium hirsutum, Hordeum
vulgare, Hyacinthus orientalis, Lolium perenne, Lotus japonicus, Medicago truncatula, Oryza
sativa, Petunia hybrida, Picea abies, Pisum sativum, Poa annua, Populus trichocarpa,
Sorghum bicolor, Triticum aestivum, Vitis vinifera, Zea mays.

As for the superclade consisting of AGL2, SQUA and AGL6, there are some notable differences
in the gene expression profiles among the three clades (Figure 3). For example, all
AGL2-LIKE genes except one (Glyma01g08130) are absent from the SAM during the early floral
transition process but are expressed later in the floral developmental process in the
flower and pod. This pattern is expected, as these genes are known to be activated
following AP1 induction in Arabidopsis [22]. The phylogenetic tree indicates that
Glyma01g08130 is the counterpart of Arabidopsis SEP4. In addition to being found in flower
and pod like the rest of the AGL2-LIKE genes, it is also expressed in the SAMs during the
floral initiation process and very highly in nodules. This pattern implies a likely
diverged function of GmSEP4, with additional roles in the early floral initiation process
as well as in nodule formation. Glyma01g08150, one of the four soybean counterparts of
Arabidopsis AP1, also likely plays a role in nodule formation. Although the expression of
Glyma01g08150 is drastically induced on 4SD in the SAM (20 RPKM), the level of expression
is more than 6-fold lower than that in the nodule (134 RPKM; Figure 3). Intriguingly, its
homeolog Glyma02g13420 is not expressed in the nodule but rather has the highest expression
in the reproductive SAM (105 RPKM; Figures 1 and 3), suggesting a functional divergence
between this homeolog pair.

Figure 3 Expression profiles of soybean MIKC^c-type transcription factors. The highest
expression level for each gene across two different sets of samples is given as an RPKM
value (see Methods). The level of expression for a gene is represented as the percentage of
the maximum expression level and colour coded from 0% (white) to 100% (black). S0-S4:
samples derived from SAM after 10-day-old soybean plants were shifted to short-day growth
conditions as described in the Methods at intervals of 0 short-day (S0), 1 short-day (S1),
2 short-day (S2), 3 short-day (S3) or 4 short-day (S4). Flw, flower; Pd, pod; L, leaf; Nd,
nodule; R, root. AG, AGAMOUS; AGL2/6/12/15/17, AGAMOUS-LIKE 2/6/12/15/17; SQUA, SQUAMOSA;
FLC, FLOWERING LOCUS C; GGM13, Gnetum gnemon MADS box transcription factor 13; STMADS11,
Solanum tuberosum MADS11; TM3, Tomato MADS box transcription factor 3; DEF, DEFICIENS; GLO,
GLOBOSA.

Although the soybean AGL6 genes are expressed in the reproductive SAMs, changes in their
transcript levels are not comparable with those of PsMADS3-LIKE and SQUA-LIKE genes during
the early floral transition process, suggesting that the latter two clades are likely to
play more prominent roles in the developmental transition process. Because there is no
information available for PsMADS3-LIKE genes, we focused our study on members of this novel
sister clade.

Expression of GmMADS3 during the floral initiation 
process 

We carried out RT-PCR analysis to verify the expression of this novel family in the soybean
SAM during the early floral initiation process (Figure 4). RNA was extracted from dissected
SAMs of plants undergoing 0, 2, 4, 6 and 10 SD treatments. RT-PCR was carried out with
primers specific to each of the soybean members in this PsMADS3 branch (Glyma16g32540,
Glyma10g38580 and Glyma20g29250) as well as to GmAP1 (Glyma16g13070) as the control for the
floral induction process. Consistent with our previous study [23], the induction of GmAP1
occurs after the 4 short-day treatment. All three soybean genes in the PsMADS3 clade
displayed a similar expression pattern to that of GmAP1, consistent with the in silico
analysis.

Figure 4 RT-PCR analyses of soybean PsMADS3-LIKE and GmAP1 sequences during early floral
initiation processes.

To examine the spatial expression pattern of these genes during the floral initiation
process, in situ hybridisation analysis was performed. On 4SD, GmAP1 expression was
detected in the incipient floral primordia of the inflorescence meristem (Figure 5a). On
6SD, newly established floral meristems became more prominent and GmAP1 expression was
detected throughout these meristems (Figure 5b). GmMADS3a (Glyma10g38580) exhibited a
rather similar expression pattern to GmAP1 on 4SD (Figure 5d), but its expression
subsequently spread to the entire inflorescence meristem as well as to the newly
established floral meristems on 6SD (Figure 5e). As the GmMADS3 homeolog pair is almost
identical in nucleotide sequence, including the UTR regions (data not shown), no
gene-specific probe could be made, and thus the signals observed may correspond to the
expression of both genes. Nevertheless, the expression of GmMADS3 throughout the entire
inflorescence and floral meristem suggests that it can serve as both an inflorescence and
floral meristem identity gene. Because the expression of GmMADS3 initially overlaps with
that of GmAP1, it likely performs similar functions to GmAP1. Its subsequent widespread
expression in the inflorescence meristem may ensure that all vegetative activities at the
SAM are replaced with the initiation of the floral meristem at the meristem flanks.

Figure 5 Spatial expression pattern of soybean MIKC^c-type transcription factors. a & b.
GmAP1 (Glyma16g13070) expression was first detected in the incipient floral primordia of
the inflorescence meristem on 4SD, and was then detected throughout the newly established
floral meristem on 6SD. d & e. GmMADS3 (Glyma10g38580, Glyma20g29250) was expressed in the
incipient floral primordia of the inflorescence meristem and subsequently in the entire
inflorescence meristem as well as the newly established floral meristems on 6SD. g & h. The
GmMADS3-like gene (Glyma16g32540) was detected in the centre of the inflorescence and
floral meristems. c, f & i. Sections were hybridised with the sense probe of the
corresponding gene, as indicated. IM: Inflorescence meristem.

The expression of Glyma16g32540 is distinct from that of GmAP1 and GmMADS3. A weak signal
associated with its expression was detected in the centre of the inflorescence meristem
(Figure 5g); on 6SD, its expression was also observed in the centre of the newly emerged
floral meristem (Figure 5h). The expression of Glyma16g32540 in the centre of the meristem
indicates its potential regulatory roles in orchestrating events in the inflorescence
meristem. The spatial expression pattern of the soybean PsMADS3-LIKE genes supports the
view that these genes are novel, as their expression differs markedly from the spatial
expression of closely related family members such as GmAP1 (this study) or Arabidopsis AGL6
[24].

Conclusions 

In contrast to Arabidopsis, where the initiation timing of floral whorls does not overlap,
the legume soybean has a flower development system with overlapping whorls [25].
Furthermore, unlike Arabidopsis, which usually cannot undergo flowering reversion [26], the
soybean inflorescence meristem can revert to leaf production when the environmental growth
conditions are switched from SD to LD [27]. Because the MIKC^c-type MADS-box genes play key
regulatory roles in different stages of flower development, it is conceivable that members
of the PsMADS3 sub-clade identified in this study could contribute to developmental
plasticity in cooperation with key floral regulators such as GmAP1 or GmFLC. Future studies
aimed at defining the interacting partners of these genes will aid in our understanding of
the floral transition process.

Methods 

Sequence and phylogenetic analysis 

Conceptually translated protein sequences were retrieved from public databases (Phytozome,
Rice Genome Annotation Project, TAIR and LjGDB). For the initial identification of the
soybean MIKC^c-type MADS-box transcription factors, all annotated genes were screened for
both the MADS-box domain (PFAM00319) and K-domain (PFAM01486). The results were then
manually inspected and filtered for truncated protein sequences, resulting in a total of 54
sequences (Additional file 1: Table S1). Sequences were imported into MEGA version 5
software for subsequent phylogenetic and molecular evolutionary analyses [28]. MUSCLE
alignments of protein sequences spanning the MADS-, I- and K-domains were carried out using
the default settings in MEGA. After alignment, the evolutionary history was estimated using
the Maximum Likelihood method based on the JTT matrix-based model as implemented in MEGA 5
with bootstrap analysis set at 200 replicates.
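
The domain screen described above (retaining only genes annotated with both the MADS-box
and the K-domain) amounts to a set intersection over per-gene domain annotations. The
sketch below illustrates that logic under stated assumptions: the input is taken to be a
list of (gene identifier, Pfam accession) pairs, the accessions are written in the standard
Pfam form PF00319 and PF01486, and the function name and gene identifiers are hypothetical.
The subsequent manual inspection for truncated proteins is not modelled.

```python
# Pfam accessions for the two domains required of a MIKC-type candidate.
MADS_BOX = "PF00319"
K_DOMAIN = "PF01486"


def screen_mikc_candidates(domain_hits):
    """Return the sorted gene IDs annotated with both required domains.

    domain_hits: iterable of (gene_id, pfam_accession) pairs, one pair
    per domain match (a gene may appear several times).
    """
    domains_by_gene = {}
    for gene_id, accession in domain_hits:
        domains_by_gene.setdefault(gene_id, set()).add(accession)

    required = {MADS_BOX, K_DOMAIN}
    return sorted(
        gene_id
        for gene_id, domains in domains_by_gene.items()
        if required <= domains  # both domains present
    )


# Hypothetical annotations: only geneA carries both domains.
example_hits = [
    ("geneA", "PF00319"),
    ("geneA", "PF01486"),
    ("geneB", "PF00319"),
    ("geneC", "PF01486"),
]
candidates = screen_mikc_candidates(example_hits)
```

Genes carrying only the MADS-box, such as geneB in this toy input, are excluded by the
screen because they lack the K-domain.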

For the expression profile analysis, two separate transcriptome sequencing datasets were
used [19,29]. The abundance of each gene was normalised within each dataset and expressed
in reads per kilobase per million mapped reads (RPKM); the values are provided in
Additional file 1.
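
For readers unfamiliar with the normalisation, the sketch below shows the standard RPKM
formula together with the percent-of-maximum scaling used to colour-code Figure 3. It is a
minimal illustration rather than the pipeline used in this study; the function names are
assumptions, and the read counts passed to rpkm in any real use would come from the
original datasets. The example values (20 RPKM in the 4SD SAM and 134 RPKM in the nodule)
are those quoted in the text for Glyma01g08150.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def percent_of_max(profile):
    """Scale one gene's expression profile to 0-100% of its maximum.

    profile: dict mapping sample name to RPKM value.
    """
    peak = max(profile.values())
    if peak == 0:
        # Gene not detected in any sample: report 0% everywhere.
        return {sample: 0.0 for sample in profile}
    return {sample: 100.0 * value / peak for sample, value in profile.items()}


# Values quoted in the text for Glyma01g08150 (S4 = SAM at 4SD, Nd = nodule).
glyma01g08150 = {"S4": 20.0, "Nd": 134.0}
scaled = percent_of_max(glyma01g08150)
fold_difference = glyma01g08150["Nd"] / glyma01g08150["S4"]  # 6.7
```

With these values the SAM signal at 4SD is about 15% of the nodule maximum, which is why a
gene can appear strongly induced within the SAM series yet pale relative to its peak
tissue.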

Plant growth and RNA extraction 

Soybean plants [Glycine max (L.) Merr. cv. Bragg] were grown in a greenhouse located at the
University of Melbourne, Victoria, Australia. To induce flowering, 10-day-old plants were
shifted to a growth chamber maintained at a constant temperature of 25°C with a 10-hr day
(150 µmol m^-2 s^-1) and 14-hr night (short-day). Shoot apical meristems (SAMs) were
micro-dissected, as previously described [24]. Total RNA was extracted from dissected SAMs
(approximately 80 SAMs per extraction) using the Qiagen RNeasy Mini Kit (Qiagen, Victoria,
Australia) with on-column DNase digestion.

RT-PCR analysis 

The Qiagen one-step RT-PCR kit was used according to the manufacturer's instructions in all
RT-PCR analyses. Total RNA (20 ng) isolated from the SAMs of 10-day-old soybean seedlings
(0 SD) and from the SAMs of plants subjected to different short-day treatments (2, 4 or 6
SD) was used as the template in a 10-µl reaction volume for 25 amplification cycles. The
primers used were:

Glyma10g38580F CAACTGGATAGAACGCTTGCACAAG
Glyma10g38580R CATCAATGGACGCTTAACGTACTATATAGC
Glyma16g32540F CTTGAGCTGACACAAAGGCA
Glyma16g32540R GCTTTGACTACCGTCTGTCTTG
Glyma20g29250F AGCTCGGAAGCACCTAACG
Glyma20g29250R CATCAATGGACCCTCAACTATAGC
Glyma16g13070F GCCTCAAAGAGCTTCAGAGTCTGGAGC
Glyma16g13070R AGAAAGCCTAGCCTTGTGACCA
ACTINF ATCATGTTTGAGACCTTCAATGTG
ACTINR CTCGAGTTCTTGCTCATAATCTAGG
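
As a quick sanity check on the primer list, the length and GC content of each
oligonucleotide can be tabulated as sketched below. This is an illustrative helper, not
part of the original analysis. It assumes that sequence fragments wrapped onto a second
line in the list belong to the preceding primer, and the function names are hypothetical.

```python
# Primer sequences as listed above (5' to 3'); wrapped fragments are
# assumed to continue the primer on the preceding line.
PRIMERS = {
    "Glyma10g38580F": "CAACTGGATAGAACGCTTGCACAAG",
    "Glyma10g38580R": "CATCAATGGACGCTTAACGTACTATATAGC",
    "Glyma16g32540F": "CTTGAGCTGACACAAAGGCA",
    "Glyma16g32540R": "GCTTTGACTACCGTCTGTCTTG",
    "Glyma20g29250F": "AGCTCGGAAGCACCTAACG",
    "Glyma20g29250R": "CATCAATGGACCCTCAACTATAGC",
    "Glyma16g13070F": "GCCTCAAAGAGCTTCAGAGTCTGGAGC",
    "Glyma16g13070R": "AGAAAGCCTAGCCTTGTGACCA",
    "ACTINF": "ATCATGTTTGAGACCTTCAATGTG",
    "ACTINR": "CTCGAGTTCTTGCTCATAATCTAGG",
}


def gc_content(sequence):
    """Percentage of G and C bases in a DNA sequence."""
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return 100.0 * gc / len(sequence)


def summarise(primers):
    """Map each primer name to a (length, GC percent) tuple."""
    return {
        name: (len(seq), round(gc_content(seq), 1))
        for name, seq in primers.items()
    }


summary = summarise(PRIMERS)
```

The two reverse primers for the homeolog pair (Glyma10g38580R and Glyma20g29250R) share the
same first eleven bases, consistent with the near-identity of the two genes noted in the
Results.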

The soybean actin gene was used as an internal control. The PCR reactions were separated on
1% agarose gels containing 0.1 µg/µl ethidium bromide and visualised under UV light.

RNA in situ hybridisation

The soybean shoot apices were dissected and fixed with 4% paraformaldehyde (Sigma,
Victoria, Australia) in phosphate-buffered saline overnight at 4°C after vacuum
infiltration. Subsequent fixation and hybridisation steps were followed as previously
described [13].

Additional file 



Additional file 1: Table S1. Expression levels of soybean MIKC^c-type
transcription factors in different tissues.